@addaleax addaleax commented Dec 3, 2024

This should hopefully address AWS auth test flakiness on s390x, where DNS queries can take multiple seconds each due to the host CI setup.

Example strace that indicates a 5 second delay for a DNS query: https://docs.google.com/document/d/1jWxKTZWma5nN5I3a9iTANnulzm_9GlLanEUbf9q9vh0/edit?tab=t.0

@nirinchev nirinchev left a comment

Is there something we can work on together with devprod to reduce DNS lookup times on these hosts? If you've already talked to them, might be good to add a comment in the code that summarizes the motivation/intricacies of this setup, as well as any steps we might want to take in the future. To be clear, this seems like a reasonable bandaid, but ideally, we'd want to improve the latency to avoid wasting CI resources.

addaleax commented Dec 3, 2024

@nirinchev Oh yeah, I was chatting with Thomas from the DevProd team who also helped with providing debug access to a host. I don't think there's a known root cause here – but they're eager to investigate this on their side.

@nirinchev

Cool. If they have a ticket tracking this, would you mind linking to it in a code comment? It'll provide valuable context for future readers.

addaleax commented Dec 3, 2024

@nirinchev I don't think there's a ticket right now. We do have:

> We recently got honeycomb metrics functional on this platform, so hopefully stuff like that will start showing up in traces

so that's likely going to help the DevProd team with keeping an eye on things, and if we feel that this is really an issue that is significant enough to require special attention, we can always open a new ticket with them. But I also have to be honest and say that this problem doesn't seem bad enough that it requires extra attention at this point – if the tests start timing out with the new 1 minute limit I'd be a lot more concerned.
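For a rough sense of the headroom involved, here is a minimal back-of-the-envelope sketch. It uses only the two figures mentioned in this thread (the roughly 5 second DNS delay from the strace and the 1 minute limit). The function and constant names are hypothetical and are not taken from the PR, and the calculation assumes the lookups run sequentially and ignores all other work a test does.

```typescript
// Illustrative sketch only: names are hypothetical, not from the actual change.

// Figures mentioned in the discussion.
const OBSERVED_DNS_DELAY_MS = 5000; // ~5 s for one DNS query, per the linked strace
const NEW_TEST_TIMEOUT_MS = 60000; // the "1 minute limit"

// How many sequential slow lookups fit inside a timeout,
// ignoring everything else the test does.
function maxSequentialLookups(timeoutMs: number, perLookupMs: number): number {
  if (timeoutMs < 0 || perLookupMs <= 0) {
    throw new RangeError('timeoutMs must be >= 0 and perLookupMs must be > 0');
  }
  return Math.floor(timeoutMs / perLookupMs);
}

// Whether a test performing `lookups` sequential DNS queries
// stays within the given timeout.
function fitsWithinTimeout(
  lookups: number,
  perLookupMs: number,
  timeoutMs: number
): boolean {
  return lookups * perLookupMs <= timeoutMs;
}
```

Under these assumptions, `maxSequentialLookups(NEW_TEST_TIMEOUT_MS, OBSERVED_DNS_DELAY_MS)` evaluates to 12, so a test would need more than a dozen back-to-back slow lookups before DNS alone exhausted the limit.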

@nirinchev

@addaleax fair enough - I agree it's sufficiently unimportant that creating a ticket is not something we need to do at this point.

@addaleax addaleax merged commit 6c9a9b4 into main Dec 4, 2024
142 of 146 checks passed
@addaleax addaleax deleted the 1924-dev branch December 4, 2024 10:38